Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ndk: Requre all off-thread dyn Fn* callback types to implement Send #455

Merged
merged 1 commit into from
Jan 27, 2024

Conversation

MarijnS95
Copy link
Member

@MarijnS95 MarijnS95 commented Dec 11, 2023

Fixes #454

In all these cases the NDK either documents or implements the callback to be invoked from a single, separate thread. For this to be allowed by Rust thread safety, the closures and their (moved) contents are effectively "moved" to a different thread (Send) but not accessed concurrently (Sync), but the type definitions were never requiring this marker trait which could lead to undefined/invalid behaviour.

In addition some Boxed callbacks may have been dropped before a new callback is passed to the NDK, leading to a possible race condition. By storing the new callback after registering it with the NDK, the previous callback is now dropped later to ensure such race conditions no longer happen (assuming AOSP callback invocations are properly synchronized with the setters).

A request has been filed to solidify this "behavioural contract" in the documentation where not done so yet: https://issuetracker.google.com/issues/300602767#comment12

@MarijnS95
Copy link
Member Author

@spencercw since you've been active around callbacks and seem to be using this crate, would you mind helping me out reviewing this and other PRs? I don't like "forcing" my own work into the codebase without at least one extra pair of eyes.

I've also been wondering why the NDK is mixing both Box<Box<dyn Fn...>> types together with F: Fn generics (and extern "C" fn native_func<F: Fn...>() {}), and whether some APIs should deregister their callbacks in drop() to prevent possible data races.

@spencercw
Copy link
Contributor

Looks like a reasonable change.

I'm not actively using this right now (changed jobs), but don't mind being an extra pair of eyes from time to time.

I've also been wondering why the NDK is mixing both Box<Box<dyn Fn...>> types together with F: Fn generics

This does seem to be a little superfluous. I guess from an API point of view it means the API user doesn't need to box the callback themselves, but would be nice to be consistent across the codebase.

and whether some APIs should deregister their callbacks in drop() to prevent possible data races.

Tricky. I think this is probably not necessary, but it depends on the underlying NDK API. For example, AMediaCodec_setAsyncNotifyCallback says 'Once the callback is unregistered or the codec is reset / released, the previously registered callback will not be called.' So hopefully we can assume that once AMediaCodec_delete has returned, it is safe to drop the corresponding Rust callback state.

@MarijnS95
Copy link
Member Author

I'm not actively using this right now (changed jobs), but don't mind being an extra pair of eyes from time to time.

Thanks! That's unfortunate but I have quite a bunch of PRs open now. @rib appears to be inactive because of the newyears' holidays so I'll wait for that.

I've also been wondering why the NDK is mixing both Box<Box<dyn Fn...>> types together with F: Fn generics

This does seem to be a little superfluous. I guess from an API point of view it means the API user doesn't need to box the callback themselves, but would be nice to be consistent across the codebase.

I thought so too at first, until it hit me: fat pointers. The pointer coming from a Box<dyn Fnxxx> is a fat pointer with the vtable and storage for whatever is moved/referenced inside the closure:

let foo: Box<dyn Fn() -> u32> = Box::new(|| 1337);
let raw = dbg!(Box::into_raw(foo));
dbg!(std::mem::size_of_val(&raw));

Gives:

[src/main.rs:3] Box::into_raw(foo) = 0x0000000000000001
[src/main.rs:4] std::mem::size_of_val(&raw) = 16

(Note that nothing is captured, so there's no data associated with the dyn Fn)

So we'll need to box twice if we wish to pass an 8-byte pointer around, or find another solution such as using generics to convey the closure type (hence the box pointer becomes 8-bytes again, but the callback is also monomorphized).


Tricky. I think this is probably not necessary, but it depends on the underlying NDK API. For example, AMediaCodec_setAsyncNotifyCallback says 'Once the callback is unregistered or the codec is reset / released, the previously registered callback will not be called.' So hopefully we can assume that once AMediaCodec_delete has returned, it is safe to drop the corresponding Rust callback state.

Yeah I've been giving this some thought:

  1. For one-off callbacks (Choreographer seems to have some) we can Box::into_raw() in the callback. In the unlikely even that the callback is never called (there's no way to unregister it, and Choreographer is a "singleton" with a global "instance" and hence no impl Drop), we'll leak a box;
  2. For functions called more often, we'll hold on to the Box and destroy it on drop() (and eventually deregister it if necessary, as discussed before - but let's check docs first);
  3. If there's ever a structure that can be "cloned" (in this crate) via incrementing the NDK refcount, we suddenly no longer have a sensible place to track the "lifetime" of a boxed callback (but we'll investigate/solve that when the case appears, I don't think we have any ATM).

Either way, deregistering callbacks or having an explicit comment that the delete() causes the callback to no longer fire is fine by me; as long as we can safely get rid of the Box and invalidate the pointer.


Then secondly I've been thinking about a general lack of when and where callbacks are called, and what functions/objects/pointers are thread-safe, but that's for another time :)

@MarijnS95
Copy link
Member Author

MarijnS95 commented Jan 6, 2024

Made an attempt to request some clarifications in https://issuetracker.google.com/issues/318944941.

Notice that it may be terrible for us to take an Fn instead of an FnMut. The latter allows interior mutability of captured variables and should not be used on concurrently-called closures AFAIK. However, I think it may be fine on these functions as they're moved to another thread and only invoked serially there.

Likewise I think that FnOnce doesn't require Sync on captured variables, only Send (same as std::thread::spawn()).

Thoughts?

@spencercw
Copy link
Contributor

I think FnMut makes sense for AMediaCodec, with FnOnce for AChoreographer and ASurfaceTransaction. This does have the built-in assumption that these callbacks indeed are only called once, but the NDK documentation being what it is we might just have to live with this and hope at some point they improve it.

@MarijnS95 MarijnS95 mentioned this pull request Jan 9, 2024
@MarijnS95
Copy link
Member Author

@spencercw I ended up opening a new issue #464 to once again go over all callbacks and document their (lack of) standardization, so that we have a guideline how to build our callbacks going forward.

On the topic of Box<Box<dyn Fn>> vs Box<T> with T: Fn, the latter can only be implemented for one-off callbacks that don't store their callback in the owning object.

In all these cases the NDK either documents or implements the callback
to be invoked from a single, separate thread.  For this to be allowed
by Rust thread safety, the closures and their (moved) contents are
effectively  "moved" to a different thread (`Send`) but not accessed
concurrently (`Sync`), but the type definitions were never requiring
this marker trait which could lead to undefined/invalid behaviour.

In addition some `Box`ed callbacks may have been dropped before a new
callback is passed to the NDK, leading to a possible race condition.
By storing the new callback _after_ registering it with the NDK, the
previous callback is now dropped later to ensure such race conditions
no longer happen (assuming AOSP callback invocations are properly
synchronized with the setters).

Multiple requests have been filed to solidify this "behavioural
contract" in the documentation where not done so yet:
https://issuetracker.google.com/issues/300602767#comment12
https://issuetracker.google.com/issues/318944941
@MarijnS95 MarijnS95 merged commit 0b9c573 into master Jan 27, 2024
38 checks passed
@MarijnS95 MarijnS95 deleted the callback-send branch January 27, 2024 22:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
impact: breaking API/ABI-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MediaCodec/ImageReader callbacks not Send
2 participants